Fiber#13
Open
daandemeyer wants to merge 65 commits into main from
Conversation
As is often the case with bitfields, because of alignment we are actually
not saving any space here: with the bitfield we use one bit of the 8 bytes
allocated, and without the bitfield we use 8 bits of those same 8 bytes.
But we pay a price in generated code at every access site of the
field:
$ diff <(objdump -S build/libsystemd.so.old) <(objdump -S build/libsystemd.so.new)
...
v->protocol_upgrade = false;
- fa2d2: 48 8b 45 a8 mov -0x58(%rbp),%rax
- fa2d6: 0f b6 90 90 01 00 00 movzbl 0x190(%rax),%edx
- fa2dd: 83 e2 fe and $0xfffffffe,%edx
- fa2e0: 88 90 90 01 00 00 mov %dl,0x190(%rax)
+ fa2a9: 48 8b 45 a8 mov -0x58(%rbp),%rax
+ fa2ad: c6 80 90 01 00 00 00 movb $0x0,0x190(%rax)
struct sd_varlink:
- /* size: 448, cachelines: 7, members: 21 */
+ /* size: 432, cachelines: 7, members: 21 */
struct sd_varlink_server:
- /* size: 160, cachelines: 3, members: 21 */
+ /* size: 152, cachelines: 3, members: 21 */
The intent was good, but we now print two or three of those messages
for each metrics report received on the wire. If the JSON object is
extensible, then all is well and we don't need to inundate the user
with this trivial information. (The message also makes it sound like
something is wrong or unexpected, when it isn't.)
...
(string):1:73: Unrecognized object field 'object', assuming extension.
(string):1:89: Unrecognized object field 'value', assuming extension.
json-stream: Received message: {"parameters":{"name":"io.systemd.Network.CarrierState","object":"virbr0","value":"degraded-carrier"},"continues":true}
(string):1:66: Unrecognized object field 'object', assuming extension.
(string):1:83: Unrecognized object field 'value', assuming extension.
json-stream: Received message: {"parameters":{"name":"io.systemd.Network.CarrierState","object":"lo","value":"carrier"},"continues":true}
(string):1:66: Unrecognized object field 'object', assuming extension.
(string):1:79: Unrecognized object field 'value', assuming extension.
json-stream: Received message: {"parameters":{"name":"io.systemd.Network.CarrierState","object":"wlp0s20f3","value":"carrier"},"continues":true}
(string):1:66: Unrecognized object field 'object', assuming extension.
(string):1:86: Unrecognized object field 'value', assuming extension.
...
Merge the two blocks adding tests: there is no obvious reason to keep them separate, as both contain tests from the same libraries.
Generic Varlink API for services that hand out file descriptors to storage volumes. Three methods: Acquire() returns an fd for a named volume (optionally creating it from a template), ListVolumes() enumerates available volumes, ListTemplates() enumerates supported creation templates. Volume types follow kernel inode-type naming: blk (block device), reg (regular file), dir (directory). Intent is that multiple providers can sit behind AF_UNIX sockets in a well-known directory and be consumed uniformly by nspawn, vmspawn, the service manager (BindVolume=) and similar tools.
First implementation of io.systemd.StorageProvider, exposing all block devices known to udev (disks, partitions, dm nodes, …) as volumes of type "blk". Names are picked from stable /dev/mapper and /dev/disk/by-* symlinks; content-derived identifiers (by-uuid, by-label, …) are intentionally avoided for security. Volume creation is not supported by this backend. Socket-activated via /run/systemd/io.systemd.StorageProvider/block. Also adds shared storage-util.[ch] (VolumeType / CreateMode helpers) that subsequent providers reuse.
Second StorageProvider implementation, exposing regular files and directories from a backing filesystem. In system mode the backing directory is /var/lib/storage/, in user mode $XDG_STATE_HOME/storage/; entries with a .volume suffix are exposed, with the inode type determining whether the volume is reported as reg, dir or (via symlinked/bind-mounted device node) blk. Unlike the block provider, this one supports creating volumes on-demand from a small set of built-in templates: sparse-file, allocated-file, directory and subvolume.
CLI for inspecting and using storage providers. Scans /run/systemd/io.systemd.StorageProvider/ (or the user-mode equivalent) for AF_UNIX sockets and talks to each one over Varlink. Verbs: "volumes" lists volumes across all providers, "templates" lists supported creation templates, "providers" lists the endpoints themselves. Also installed as a mount.storage helper, so 'mount -t storage PROVIDER:VOLUME /mnt' (or 'mount -t storage.<fstype>' to put a fresh filesystem on a block volume) acquires the volume and mounts it. Ships with bash/zsh completions and a man page.
VM-only test that exercises both shipped providers through storagectl: verifies the well-known sockets exist, lists providers/volumes/templates, creates and acquires volumes from each template (sparse-file, allocated-file, directory, subvolume), attaches a loop device to cover the block provider, and exercises the mount.storage helper.
Records the still-missing StorageProvider integrations (nspawn, vmspawn, service-manager BindVolume=) and replaces the now-obsolete generic "storage API via varlink" entry with a NetworkProvider proposal modelled on it.
So strv_push_with_size() doesn't have to recalculate the size every time.
The PCR to measure into is closely associated with where we place a resource in the initrd cpios. Hence, let's also track it in CpioTarget, simplifying our function parameter lists. No change in behaviour.
This loads the new 'extra' stanza, but doesn't actually do anything with it yet. That's added in a later commit.

Replaces: systemd#39286
Implements: uapi-group/specifications#212
This generates on-the-fly cpio initrds from 'extra' resources declared in Type #1 entries and installs them via the Linux initrd protocol, so that they get passed to the Linux kernel.

Replaces: systemd#39286
It'll be used in the next commit.
Verb dispatch is left untouched for now. Co-developed-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixup for 8623980. This didn't cause any problems until the conversion away from getopt_long().
--timeout-signal is now documented (fixup for e209926). Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
…pe 1) (systemd#41863) This implements the "extra" stanza for type 1 entries in systemd-boot, see: uapi-group/specifications@bde167a It comes with a really thorough test suite matching our current level of testing of systemd-boot (read: there is none, so I ask you to trust me, Claude, and your own review on this one)... Split out of systemd#41543
option_parser_next_arg() is renamed to option_parser_peek_next_arg() to match option_parser_consume_next_arg(). A new helper option_parser_get_arg(…, n) is added: it is a common pattern to need only a single arg, and getting an array and extracting a single item from it is too verbose.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
…to be used by vmspawn/nspawn/pid1 to provide storage volumes in a generic fashion (systemd#41776) BindPath= in unit files and --bind= in nspawn/vmspawn don't really cut it for connecting arbitrary storage infra. Let's do something about it, and implement a simple, light-weight API for acquiring an fd to a storage volume. Benefits:

1. the interface can be implemented by anyone, connecting anything to vmspawn/nspawn/service management
2. very loose coupling: just bind a socket into a well-known dir, done
3. mounting can happen on-demand
This addresses some trivial points made by @keszybz in the PR review.
This is mostly stuff discussed in systemd#41776.
So strv_push_with_size() doesn't have to recalculate the size every time.
Just small stuff.
…temd#41869) FOREACH_ARRAY declares 'i' as the iterator but the body passed 'd' (the array base) to block_device_done(). Since mfree() leaves the field NULL after the first call, element 0 is freed repeatedly while elements 1..N-1 leak their node, symlinks strv, model, vendor and subsystem.

The bug predates the sanitizer-instrumented callers. PR systemd#41776's new systemd-storage-block daemon runs blockdev_list() under ASan/LSan in TEST-87-AUX-UTILS-VM and exposes it (15 allocs / 804 bytes leaked per ListVolumes request). The fix also benefits repart and blockdev_list's internal CLEANUP_ARRAY cleanup.

Follow-up for 9f6b274
Follow-up for 6b1324f Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
When /proc is bind-mounted read-only (common in mock/Koji buildroots, containers, and other sandboxed environments), opening /proc/sys/fs/binfmt_misc returns ELOOP if it is an automount point that cannot be triggered in the read-only context. Currently binfmt_mounted_and_writable() only handles ENOENT, so ELOOP propagates as an error. This causes test-binfmt-util to fail with SIGABRT and disable_binfmt() to log a spurious warning at shutdown.

Treat ELOOP and EACCES the same as ENOENT: binfmt_misc is not usably available, return false.

Note: PR systemd#37006 (merged April 2025) addressed ELOOP in the xstatfsat() path, but the open() call in binfmt_mounted_and_writable() remained unhandled.

Fixes systemd#38070
If they are read-only they are not candidates, since we cannot write to them.
Small hygiene fix: r must be >= 0 as per the prior statement (otherwise we would have returned). In practice it can only be r == 0 here, so "return r;" is effectively "return 0;". I'm updating this to use log_debug_errno().
…CLEAR_FUNC() DEFINE_POINTER_ARRAY_CLEAR_FUNC() generates a helper of the form helper_array_clear(T *array, size_t n) that drops each element but does not free the array itself, parallel to DEFINE_POINTER_ARRAY_FREE_FUNC(), for cases where the array has automatic storage duration. CLEANUP_ELEMENTS() pairs with these helpers to provide a _cleanup_-like attribute for fixed-size arrays: the bound is taken from ELEMENTSOF(), and the helper is invoked across the elements at scope exit. Compared to CLEANUP_ARRAY(), the storage is neither freed nor zeroed. Migrate various logic across the tree over to the new macros.

sd-device: use DEFINE_POINTER_ARRAY_CLEAR_FUNC() for sd_device_unref_array_clear(). Replace the local device_unref_many() helper with the macro-generated equivalent.

format-table: switch help-table arrays to CLEANUP_ELEMENTS(). Generate table_unref_array_clear() via DEFINE_POINTER_ARRAY_CLEAR_FUNC() and convert the help-table arrays in bootctl, cryptenroll, nspawn, repart and vmspawn to CLEANUP_ELEMENTS(). The arrays no longer need a trailing NULL slot, so the size matches ELEMENTSOF() of the groups array.

firewall-util: switch netlink message arrays to CLEANUP_ELEMENTS(). Generate sd_netlink_message_unref_array_clear() via DEFINE_POINTER_ARRAY_CLEAR_FUNC() in place of the NULL-terminated sd_netlink_message_unref_many(), and convert the two stack arrays of sd_netlink_message pointers to CLEANUP_ELEMENTS().
Let's cap the number of questions each query can have to something reasonable: 128 questions per query should be more than enough for any real-world scenario.
Let's start with 1024, as that should be plenty for all sane use cases.
I am really not a fan of full code lines passed to macros as parameters. Let's get rid of the 3rd parameter of FOREACH_OPTION() hence:

1. Return errors just as a regular value (though a negative one) that can be handled via an OPTION_ERROR case statement in the switch. This normalizes handling of the error, just like any other event returned by the option parser.

2. To avoid exploding the amount of boilerplate at each use site (which just propagates the error on OPTION_ERROR), introduce an explicit FOREACH_OPTION_OR_RETURN() that returns from the calling function on its own (and makes that clear in the name).

Together this cleans up and normalizes the logic and shortens the code.
ucontext_t and the makecontext()/swapcontext() family are required by upcoming fiber support, but musl deliberately does not ship them in libc. libucontext provides standalone implementations of these and is the canonical replacement on musl-based distributions.

Wire libucontext up as an optional dependency, required when building against musl (where it's mandatory) and opt-in elsewhere.

When libucontext is built in freestanding mode (as it typically is on glibc-based distributions that ship it), <libucontext/libucontext.h> collides with <sys/ucontext.h> over REG_R8 and friends, and we can't simply avoid <sys/ucontext.h> because <signal.h> pulls it in unconditionally. Add override headers under src/include/override/ that either forward to libucontext's header (and alias the makecontext family to its libucontext_-prefixed counterparts) or fall through to the system header via #include_next, depending on whether libucontext is enabled.

The musl CI workflow gains libucontext-dev so the upcoming fiber code compiles there.
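The override-header trick might look roughly like this. This is a sketch, not the actual file: the macro name and the exact alias set are assumptions, and the real header has to cover the whole makecontext family.

```c
/* src/include/override/ucontext.h (sketch, illustrative only) */
#if HAVE_LIBUCONTEXT
/* Freestanding libucontext: use its header and its prefixed symbols,
 * avoiding the REG_R8 collision with <sys/ucontext.h>. */
#  include <libucontext/libucontext.h>
#  define getcontext  libucontext_getcontext
#  define setcontext  libucontext_setcontext
#  define makecontext libucontext_makecontext
#  define swapcontext libucontext_swapcontext
#else
/* No libucontext: fall through to the system header. */
#  include_next <ucontext.h>
#endif
```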
Traditionally, asynchronous programming in systemd has been achieved using sd-event along with the asynchronous interfaces of sd-bus and sd-varlink. This works well when the system is reacting to events and all code triggered by those events can run without blocking. In these scenarios, the global Manager object is passed as userdata to the callback, and the callback can use the stack as usual, declaring local state and ensuring proper cleanup via _cleanup_. Control flow structures, such as loops, work as expected, and everything runs smoothly. However, challenges arise when the code needs to perform long-running operations within these callbacks. Since the system cannot block execution within the callback, we can't directly invoke a long-running operation and wait for its result without introducing complexities. Instead, we need to initiate the long-running task, register for completion with sd-event, sd-bus, or sd-varlink, and provide a callback to be invoked when the operation completes. This callback, however, only receives a single userdata pointer, which forces us to bundle all local variables into a struct and pass it along as part of the callback. On top of that, after queuing the asynchronous operation, the caller continues executing. As the caller's stack unwinds when the function exits, the resources and state within the local scope may be prematurely cleaned up. Therefore, the struct must store copies of the local variables or ensure proper reference counting to prevent premature resource cleanup. When multiple long-running operations need to be initiated within a loop, the complexity grows further. We must introduce additional shared state to track the completion of all operations before we can run any code that depends on their results. 
Furthermore, since the daemon may be shut down at any time, we must track the lifecycle of each long-running operation in the global Manager struct, ensuring proper cleanup even when stack unwinding can no longer manage the resources for us. Fibers, or green threads, provide a more natural way of handling asynchronous operations. By enabling cooperative multitasking within a single thread, fibers allow us to write code that looks like it’s running synchronously, but with the ability to yield control at predefined points, such as when waiting for long-running tasks to complete. With fibers, we can simplify the control flow by running asynchronous operations within a fiber, allowing us to "pause" execution while waiting for the long-running operation to finish and then "resume" the operation once it's complete. This eliminates the need for multiple callback chains, extensive state tracking, and the potential pitfalls of stack unwinding. This commit introduces the ability to execute long-running operations in a non-blocking manner while maintaining the simplicity and readability of synchronous code. The fiber-based approach will significantly improve the handling of complex workflows, making the code easier to write and maintain. The implementation is based on ucontext.h and sd-event. ucontext.h provides us with alternate stacks that we can switch between. The default stack size is the same as a regular thread. Because we use mmap() to allocate the stack, the memory won't actually be used until it is paged in by the kernel, so we don't actually use 8MB per fiber. To integrate fibers with the event loop, each fiber is assigned a deferred event source which resumes the fiber when enabled. The deferred event source is oneshot by default so the fiber will run immediately until it yields or suspends. If it yields, the deferred event source is enabled again (oneshot) immediately. 
If it suspends, before it suspends, one or more event sources are registered with sd-event that will enable the deferred event source (oneshot) to resume the fiber once the operation it is waiting for completes. Yielding or suspending the fiber is done by calling sd_fiber_yield() or sd_fiber_suspend() respectively. Both of these return zero on success or any error value from the async operation that caused the fiber to resume. This is also how fiber cancellation is implemented. When a fiber is cancelled, sd_fiber_yield() and sd_fiber_suspend() will return ECANCELED when the fiber is resumed, allowing the fiber to unwind its stack (which allows cleanup to happen automatically) and finish. Instead of having applications work directly with fibers, we hide them behind a generic futures interface to represent long-running operations, regardless of whether those operations are running on a fiber or not. Aside from fibers, the futures library (sd-future) allows waiting for sd-event sources and doing sd-bus calls in the background as well. Fibers can suspend until a future is ready with sd_fiber_await(). The futures library has two sides. sd_future is the read side that consumers hold and inspect; sd_promise is the embedded write side that producers use to resolve it. The two share storage — the promise is a member of the future — and producers recover the wrapping future from a promise via container_of(). Each future kind plugs into the library by providing an sd_future_ops vtable (free, cancel, set_priority) and an opaque implementation struct via sd_future_new(). The library treats the impl as a black box; the only constraint is that its first field must be sd_promise*, which sd_future_new() stamps with a back-pointer to the wrapping future. This lets handlers (e.g. 
an sd-event IO callback) resolve the future from just the impl pointer without having to keep a separate sd_future* around, and keeps producers small — the IO future, time future, and bus-call future each fit in roughly fifty lines. A future starts in SD_FUTURE_PENDING and transitions exactly once to SD_FUTURE_RESOLVED, carrying an integer result. Consumers can react to that transition either by installing a one-shot callback with sd_future_set_callback() (callback-style code) or by waiting on it from a fiber via sd_fiber_await() (synchronous-looking fiber code). sd_fiber_await() is itself built on a "wait future" that resolves when its target resolves; sd_future_new_wait() exposes the same primitive directly so non-fiber callers can chain futures without involving a fiber. Cancellation is cooperative: sd_future_cancel() invokes the impl's cancel callback, which is responsible for tearing down its work and ultimately resolving the promise with -ECANCELED. For fiber futures this is what surfaces as the ECANCELED return from sd_fiber_yield()/sd_fiber_suspend() mentioned above. Fire-and-forget fibers — created by passing a NULL ret to sd_fiber_new() — take a self-reference on their future so they outlive the caller's scope. The self-ref is dropped when the fiber resolves. This floating mechanism (sd_fiber_set_floating()) is restricted to fiber futures because they uniquely guarantee resolution; allowing it for arbitrary future kinds would risk silent leaks for kinds that may never resolve. Note that fiber cleanup depends on the runtime operating normally. Each fiber's _cleanup_-style cleanups live on the fiber's own stack and run only when the fiber is resumed and allowed to unwind, which requires a working event loop to drive it to completion. The exit event source registered for top-level fibers ensures unwind on a normal sd_event_exit(), but if the event loop itself terminates abnormally (e.g. 
an unrecoverable allocation failure mid-dispatch) before all fibers have resolved, their stacks never unwind and any resources they own leak. This is a structural property of stackful coroutines, shared with libraries like Boost.Coroutine and libdill; for resources where leaking is unacceptable, callers must arrange explicit teardown rather than relying solely on fiber-stack cleanup. The code lives in libsystemd as sd-future (not exported) for the following reasons: - We may want to make this a public libsystemd API in the future - The code can't live in src/basic as it makes heavy use of sd-event - The code can't live in src/shared as sd-bus and sd-event make use of it The basic fiber definitions do live in src/basic as we need them in log-context.c and log.c to give each fiber its own log context instead of every fiber operating on the thread global log context.
Add a family of sd_fiber_*() I/O wrappers that, when called from a
fiber, behave like blocking I/O from the caller's perspective but
yield to the event loop instead of blocking the thread:
sd_fiber_read / sd_fiber_write
sd_fiber_readv / sd_fiber_writev
sd_fiber_recv / sd_fiber_send
sd_fiber_connect
sd_fiber_recvmsg / sd_fiber_sendmsg
sd_fiber_recvfrom / sd_fiber_sendto
sd_fiber_accept
sd_fiber_poll
All of them share a single helper, fiber_io_operation(), which when
invoked outside a fiber falls through to the underlying syscall
directly, preserving the regular blocking behaviour. Inside a fiber
the helper flips the fd to non-blocking (restoring its original mode
on return), tries the syscall once on the fast path, and on EAGAIN/
EWOULDBLOCK creates an sd-event-backed IO future via future_new_io(),
suspends the fiber, and retries the syscall once the event source
fires. Errors propagate as negative errno values, matching the
convention of other sd-* APIs.
future_new_io() itself is added to sd-event/event-future.{c,h} as a
new IoFuture kind. It wraps sd_event_add_io() into an sd_future:
oneshot enable, EPOLLERR translated via SO_ERROR (suppressed for
non-sockets), and the fd duplicated with F_DUPFD_CLOEXEC to avoid
EEXIST when multiple sources watch the same descriptor. Cancellation
disables the source and resolves the promise with -ECANCELED. It's
the same pattern as the time and child future kinds added in the
previous commit.
Together these let fiber-using code write straight-line socket and
pipe I/O without bundling state into callbacks. Tests covering the
fast path, suspend-and-retry path, fallback-when-not-on-a-fiber path,
cancellation while suspended, blocking-mode preservation, and shared
fd / multiple-fiber scenarios live in test-fiber-io.c.
Some helpers in src/basic — ppoll_usec_full() (used by fd_wait_for_event()), loop_read(), loop_read_exact(), loop_write_full() and pidref_wait_for_terminate_full() — block the calling thread. That's the right behaviour outside a fiber but not inside one, where blocking the thread also stalls every other fiber running on the same event loop. Rewriting every caller to pick a fiber or non-fiber variant explicitly would be a lot of churn and would split otherwise-shared code paths in two. Instead, the helpers detect at runtime whether they're running on a fiber and dispatch to a suspending variant when they are.

FiberOps in fiber-def.h holds five function pointers (ppoll, read, write, timeout, timeout_done); each Fiber stores a pointer to a const FiberOps that sd_fiber_new() populates with sd_fiber_poll/sd_fiber_read/sd_fiber_write/sd_fiber_timeout/sd_future_unref, so the suspending implementations themselves stay in libsystemd. FIBER_OPS_FORWARD() temporarily clears the ops pointer around the dispatched call so the op's body can reuse the non-redirected helpers without recursing.

- ppoll_usec_full() uses FIBER_OPS_FORWARD() at the top to tail-call the ppoll hook when on a fiber, otherwise falls through to the normal ppoll() body. ss must be NULL on a fiber since sd_fiber_poll() doesn't take a sigmask.

- loop_read()/loop_read_exact() call the read hook directly when on a fiber and the fd is blocking, which suspends on EAGAIN until data is available — making the do_poll knob and the explicit fd_wait_for_event() retry loop unnecessary in that path. When the fd is already non-blocking and do_poll is false, the original return-EAGAIN-immediately semantic is preserved by falling through to the read() path.

- loop_write_full() likewise calls the write hook inside a FIBER_OPS_WITH_TIMEOUT() scope so the caller's timeout is honoured via a deadline future, mirroring SD_FIBER_TIMEOUT() but reachable from src/basic without pulling in sd-future.h. The timeout==0 fast-return-EAGAIN semantic is preserved the same way.

- pidref_wait_for_terminate_full() polls the pidfd via fd_wait_for_event() before each waitid() when either a finite timeout is set or we're on a fiber, and requires pidref->fd >= 0 in those cases (returning -ENOMEDIUM otherwise — extending the rule that already applied to finite timeouts). The poll suspends the fiber via the ppoll hook above; the subsequent waitid() doesn't block because the pidfd is already signalled.

Tests in test-fiber-ops.c cover the suspending paths for these helpers, the cooperative-scheduling ordering they enable across multiple fibers, and the fall-through-to-blocking behaviour when called outside any fiber.
…iber sd_event_run() blocks the calling thread on the event loop's epoll fd until something happens. When the caller is a fiber, that's the wrong behaviour: blocking the thread also stalls every other fiber and the outer event loop driving them. The most common way to hit this is a fiber that creates its own inner event loop (e.g. a server-style fiber that wants to dispatch its own sources independently of whatever loop the test or supervising fiber is running on) — with the existing implementation the inner sd_event_run() would hold the thread while the outer scheduler should be free to advance other fibers.

Add an event_run_suspend() variant in sd-event/event-future.c that performs the same prepare/wait/dispatch dance, but when the fast path finds nothing ready it (a) creates an IO future watching the inner event loop's epoll fd on the *outer* event loop, (b) optionally creates a time future for the timeout, and (c) suspends the fiber. When either future fires, the fiber is resumed and the prepare/wait/dispatch sequence runs once more to actually dispatch what's pending. sd_event_run() checks sd_fiber_is_running() and delegates to this variant when on a fiber; profile_delays accounting is intentionally skipped on that path since the underlying prepare/wait/dispatch primitives already account for themselves.

PROTECT_EVENT() moves from sd-event.c into a new event-util.h so it can be reused by event_run_suspend() without exporting it as a libsystemd symbol.

test-event-future.c covers the suspending paths: zero-timeout fast return, immediately-pending IO, IO arriving during suspension, timer firing during suspension, repeated short-timeout calls (the post-error SD_EVENT_ARMED state regression), and a nested fiber-driven inner event loop running concurrently with an outer timer.
Three changes to teach sd-bus how to behave when called from a fiber, in
order of increasing depth:
1. bus_poll() now uses sd_fiber_ppoll() instead of ppoll_usec(). On the
non-fiber path that's a transparent fall-through; on a fiber it
suspends instead of blocking the thread, so other fibers and the
surrounding event loop keep running while the bus waits for I/O.
2. sd_bus_call() now redirects to a new bus_call_suspend() helper when
the caller is a fiber whose event loop is the same one the bus is
attached to. The plain bus_poll() path serializes all bus traffic on
the slot's reply (only one method call can be in flight per
sd_bus*), which would defeat the point of running multiple fibers
against one bus. bus_call_suspend() builds on the async sd-bus API:
it wraps the call in a new BusFuture (sd-bus/bus-future.{c,h}) that
resolves when the reply or method-error arrives, lets the fiber
await that future, and surfaces the reply to the caller via
future_get_bus_reply(). Because the futures live on the event loop
rather than a per-bus slot, multiple fibers can drive concurrent
method calls against the same bus.
3. A new private SD_BUS_VTABLE_METHOD_FIBER flag dispatches a vtable
method handler on its own fiber, so handlers are free to use
sd_bus_call() against the same bus, sd_fiber_sleep(), loop_read(),
etc. without stalling the event loop for other connections or
handlers. The flag stays out of sd-bus-vtable.h (its bit value is
reserved there to prevent collisions) — the fiber runtime is a
systemd-internal implementation detail.
Lifecycle of fiber-dispatched handlers is tracked on the bus itself: a
new bus->fiber_futures set holds a ref to each in-flight handler.
bus_enter_closing() cancels every entry and process_closing() returns
with the bus still in CLOSING state until the set drains, so we can be
sure no fiber handler outlives the bus. bus_fiber_resolved() removes
the entry on completion. bus_free()'s assert(set_isempty()) makes the
invariant load-bearing.
To exercise these changes the existing thread-based client/server
sd-bus tests (test-bus-chat, test-bus-objects, test-bus-peersockaddr,
test-bus-server, test-bus-watch-bind) are migrated to fibers, and a
new test-bus-fiber is added that covers SD_BUS_VTABLE_METHOD_FIBER —
including handlers that issue nested sd_bus_call() on the same bus, the
cancel-on-close path, and concurrent dispatches across multiple fibers.
Three changes, in increasing depth:
1. json_stream_wait() now uses sd_fiber_ppoll() instead of
ppoll_usec(). On the non-fiber path that's a transparent
fall-through; on a fiber it suspends instead of blocking the thread.
Because all of varlink's synchronous client paths
(sd_varlink_wait(), sd_varlink_call(), sd_varlink_collect()) drive
their I/O through json_stream_wait(), this change alone makes them
safe to call from a fiber.
2. Add varlink_server_bind_fiber() and varlink_server_bind_fiber_many()
in varlink-util.{c,h} for registering a method handler that should
run on a dedicated fiber per dispatch. The fiber-bound methods live
in a separate s->fiber_methods map alongside the regular s->methods;
bind_internal()/bind_many_internal() are factored out so the regular
and fiber bind variants share their parsing/insertion code.
Registering the same method in both maps is rejected because the
dispatcher consults the regular map first and would otherwise
silently shadow the fiber binding.
3. varlink_dispatch_fiber() builds a VarlinkFiberData (refs to the
connection, parameters, and method name), spawns a fiber via
sd_fiber_new(), and makes the future floating so the fiber
self-manages its lifetime — neither the dispatcher nor the
connection has to track it. The fiber's priority is set to one
below the connection's quit event source so that on graceful
shutdown the fiber's exit handler fires (and runs its cleanup)
before varlink's quit_callback() closes the connection underneath
it; this is what lets a fiber-bound handler reply or flush its
sentinel on a still-open connection during shutdown.
The connection state transitions are reordered so they happen before
the fiber spawn rather than after the synchronous callback returns:
the fiber runs after dispatch has already moved past PROCESSING, which
matches the behaviour expected for a deferred reply (the fiber may
either reply immediately, or stash the connection and reply later, in
which case the post-callback logic treats it as a PENDING_METHOD).
The client/server varlink tests are migrated to fibers (threads → mock
server fibers on the same event loop) to exercise the new paths.
The synchronous qmp_client_call() pumps the event loop until its reply
arrives, pinning the parsed reply on c->current so it can hand out
borrowed pointers to the caller. That model only fits one in-flight
sync call: a second qmp_client_call() on the same client clears
c->current before issuing its own send, invalidating the first caller's
borrowed pointers. On a single-threaded event loop that was fine, but
with fibers two concurrent calls on the same client can interleave
through the pump (json_stream_wait() suspends the running fiber) and
trample each other.

Add three entry points:

- qmp_client_call_future(): the async building block. Returns an
  sd_future backed by a QmpFuture impl that owns the reply variant and
  a strdup'd error_desc. The reply callback resolves the promise;
  cancellation drops the pending slot, so a late reply doesn't fire
  into freed memory, and resolves the promise with -ECANCELED.
- future_get_qmp_reply(): borrowed-lifetime extraction from a resolved
  future. The pointers stay valid until the future is freed.
- qmp_client_call_suspend(): the convenience wrapper for fibers. Issues
  the call via qmp_client_call_future(), suspends the fiber, then
  surfaces result and error_desc through the same borrow contract as
  qmp_client_call(): valid until the next qmp_client_call*() on this
  client. The contract is implemented by pinning the resolved future on
  the client (current_call_future) and unref'ing the previous one on
  entry.

Because the per-call state lives on the future rather than on a single
c->current slot, multiple fibers can have their own in-flight calls on
the same client without clobbering each other. Then make
qmp_client_call() detect when it's running on a fiber whose event loop
matches the client and transparently delegate to
qmp_client_call_suspend(), so existing call sites become safe under
concurrent fibers without source changes.
To make this safe under concurrency, qmp_client_call() now hands out
references and copies of the error strings, so the borrowed pointers it
returns no longer need to be stored in the QmpClient struct itself.
The mock servers used to be driven out-of-band: each test created a
socketpair, forked a child, ran a hand-coded request/response script
against the raw fd, and sent SIGTERM to tear it down. That worked but
required pidref/process-util/signal plumbing in every test, two
distinct execution contexts that couldn't share state, and a JsonStream
attached to the mock side that pretended to be event-loop-driven while
actually being driven manually via blocking reads.
Now that JsonStream exposes suspending helpers, the mocks can live
inside the same process and event loop as the client. Each mock is
rewritten as an sd-fiber that runs alongside the client fiber: the
JsonStream uses the suspending json_stream_wait()/flush() variants,
so the mock fiber yields on I/O and the event loop schedules the
client in the meantime. Both sides progress cooperatively: no
fork/SIGTERM/PID plumbing, no manual phase tracking.
Two cleanups fall out of the rewrite:
- A QMP_TEST(name, mock_fn) { ... } macro encapsulates the per-test
scaffolding (event loop, socketpair, mock fiber spawn, exit-on-idle
shim) and injects an already-connected QmpClient *client into the
test body. Each test now reads as a flat sequence of
qmp_client_call() invocations against that client.
- Repeated mock command/reply scripting is factored into
mock_qmp_expect(), mock_qmp_reply(), mock_qmp_expect_and_reply(),
mock_qmp_handshake(), and mock_qmp_query_status_running(). The
greeting JSON is built with sd_json_buildo() instead of being parsed
from a literal.
The file shrinks from 756 to 494 lines, mostly through deletions.